Noise robust speech recognition with state duration constraints
Abstract
In this paper, we present a method to incorporate and re-estimate state duration constraints within the Maximum Likelihood training of hidden Markov models. In the recognition phase, we find the optimal state sequence that fulfills the state duration constraints obtained in the training phase. Our goal is to make speaker-dependent training and recognition perform well with a very small amount of training data when the training and testing environments are mismatched. We take advantage of the fact that speakers tend to preserve their speaking style in similar situations (e.g. when speaking to a machine), and our main means of reaching this goal is to force similar state segmentations in the training and recognition phases. We show that the proposed method substantially improves the robustness of a speech recognizer and decreases error rates by over 93% compared with a standard approach.
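As an illustration of what decoding under state duration constraints can look like, the following is a minimal sketch and not the paper's algorithm, which the abstract does not spell out. It assumes a left-to-right model in which every state is visited exactly once, hard per-state minimum and maximum durations in frames (each minimum at least 1), and no transition probabilities. The function name `duration_constrained_viterbi` and its parameters `loglik`, `dmin`, and `dmax` are hypothetical.

```python
import numpy as np

def duration_constrained_viterbi(loglik, dmin, dmax):
    """Best left-to-right state segmentation under hard duration bounds.

    loglik: (T, N) array of frame log-likelihoods per state (assumed given).
    dmin, dmax: per-state minimum / maximum durations in frames (assumed >= 1).
    Returns (score, path), or (-inf, None) if no segmentation is feasible.
    """
    loglik = np.asarray(loglik, dtype=float)
    T, N = loglik.shape
    # cum[t, s] holds the sum of loglik[0:t, s], so any segment sum is a difference.
    cum = np.vstack([np.zeros((1, N)), np.cumsum(loglik, axis=0)])
    # best[i, t]: best score when the first i states cover exactly the first t frames.
    best = np.full((N + 1, T + 1), -np.inf)
    back = np.zeros((N + 1, T + 1), dtype=int)  # duration chosen for state i-1
    best[0, 0] = 0.0
    for i in range(1, N + 1):
        s = i - 1
        for t in range(1, T + 1):
            for d in range(dmin[s], min(dmax[s], t) + 1):
                prev = best[i - 1, t - d]
                if prev == -np.inf:
                    continue  # the preceding states cannot cover t - d frames
                score = prev + cum[t, s] - cum[t - d, s]
                if score > best[i, t]:
                    best[i, t] = score
                    back[i, t] = d
    if best[N, T] == -np.inf:
        return -np.inf, None
    # Backtrack the chosen durations from the last state to the first.
    path, t = [], T
    for i in range(N, 0, -1):
        d = int(back[i, t])
        path = [i - 1] * d + path
        t -= d
    return float(best[N, T]), path
```

Tightening `dmin` and `dmax` around the durations observed in training forces the recognition-phase segmentation to resemble the training-phase one, which is the effect the abstract describes. A full recognizer would also include transition probabilities and would re-estimate the bounds during training.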
Similar resources
Improving the performance of MFCC for Persian robust speech recognition
Mel-frequency cepstral coefficients (MFCCs) are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in Automatic Speech Recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is a pre-processing step which applies to t...
Context-dependent word duration modelling for robust speech recognition
Conventional hidden Markov models (HMMs) have weak duration constraints. This may cause the decoder to produce word matches with unrealistic durations in noisy situations. This paper describes techniques for modelling context-dependent word duration cues and incorporating them directly in a multi-stack decoding algorithm. The proposed model is capable of penalising duration constraints of a wor...
A new method for robust speech recognition based on missing data using a bidirectional neural network
The performance of speech recognition systems is greatly reduced when speech is corrupted by noise. One common approach to robust speech recognition is the missing-feature method. In this approach, the components of the time-frequency representation of the signal (spectrogram) that have a low signal-to-noise ratio (SNR) are tagged as missing and deleted, then replaced using the remaining components and statistical ...
Robust connected word speech recognition using weighted Viterbi algorithm and context-dependent temporal constraints
This paper addresses the problem of connected word speech recognition with signals corrupted by additive and convolutional noise. Context-dependent temporal constraints are proposed and compared with the ordinary temporal restrictions, and used in combination with the weighted Viterbi algorithm which had been tested with isolated word recognition experiments in previous papers. Connected-word r...
Speaker dependent temporal constraints combined with speaker independent HMM for speech recognition in noise
This paper addresses the problem of speech recognition in noise using speaker-dependent temporal constraints in the Viterbi algorithm in combination with speaker-independent HMM. It is shown that the speaker-dependent re-estimation of state duration parameters requires a low computational load and a small training database, and can lead to reductions in the error rate as high as 30% or 40% with...